Near-optimal Reinforcement Learning in Factored MDPs
Authors
Ian Osband, Benjamin Van Roy
Abstract
Any learning algorithm over Markov decision processes (MDPs) will have worst-case regret Ω(√(SAT)), where T is the elapsed time and S and A are the cardinalities of the state and action spaces. In many settings of interest, S and A may be so huge that it is impossible to guarantee good performance for an arbitrary MDP on any practical timeframe T. We show that, if we know the true system can be represented as a factored MDP, we can obtain regret bounds that scale polynomially in the number of parameters of the MDP, which may be exponentially smaller than S or A. Assuming an algorithm for approximate planning and knowledge of the graphical structure of the underlying MDP, we demonstrate that posterior sampling reinforcement learning (PSRL) and an algorithm based upon optimism in the face of uncertainty (UCRL-Factored) both satisfy near-optimal regret bounds.
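To make the setting concrete, below is a minimal sketch of the generic posterior sampling (PSRL) loop for a small tabular MDP, not the factored variant analyzed in the paper: a Dirichlet posterior over transitions is resampled at the start of each episode, the sampled model is solved by value iteration, and the resulting greedy policy is followed. All sizes, the known-reward assumption, and the `true_step` stand-in environment are illustrative.

```python
import numpy as np

S, A, H = 5, 2, 10           # states, actions, episode length (illustrative)
rng = np.random.default_rng(0)

# Dirichlet posterior over transition probabilities, one per (s, a).
# counts[s, a, s'] starts at 1, i.e. a uniform Dirichlet prior.
counts = np.ones((S, A, S))
R = rng.random((S, A))        # rewards assumed known here, for brevity

def sample_mdp():
    """Draw one plausible transition model from the posterior."""
    P = np.empty((S, A, S))
    for s in range(S):
        for a in range(A):
            P[s, a] = rng.dirichlet(counts[s, a])
    return P

def solve(P):
    """Finite-horizon value iteration; returns a greedy policy per step."""
    V = np.zeros(S)
    policy = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        Q = R + P @ V              # Q[s, a] = R[s, a] + sum_s' P[s,a,s'] V[s']
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

def true_step(s, a):               # stand-in for the unknown environment
    return int(rng.integers(S))

for episode in range(100):
    policy = solve(sample_mdp())   # plan against the sampled model
    s = 0
    for h in range(H):
        a = policy[h, s]
        s_next = true_step(s, a)
        counts[s, a, s_next] += 1  # posterior update from observed transition
        s = s_next
```

The factored algorithms in the paper replace the flat Dirichlet posterior with one posterior per factor of the transition model, which is what shrinks the regret bound from √(SAT) to a polynomial in the number of factored parameters.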
Similar Resources
Efficient Reinforcement Learning in Factored MDPs
We present a provably efficient and near-optimal algorithm for reinforcement learning in Markov decision processes (MDPs) whose transition model can be factored as a dynamic Bayesian network (DBN). Our algorithm generalizes the recent E3 algorithm of Kearns and Singh, and assumes that we are given both an algorithm for approximate planning, and the graphical structure (but not the parameters) o...
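For intuition about why the factored representation helps, the sketch below shows a DBN-factored transition model: each next-state variable depends only on a small parent set, so n binary variables need on the order of n·2^|parents| parameters rather than the 2^n × 2^n of a flat model. The graph and probabilities here are made up for illustration and are not from the paper.

```python
import numpy as np

# A factored (DBN) transition model over n binary state variables.
# The parent structure is assumed known, as in the abstract above.
n = 4
parents = {0: (0,), 1: (0, 1), 2: (2,), 3: (2, 3)}   # illustrative graph
rng = np.random.default_rng(1)

# One conditional probability table per variable: P(x_i' = 1 | parents),
# indexed by the joint assignment of the parent variables.
cpts = {i: rng.random(2 ** len(ps)) for i, ps in parents.items()}

def step(state):
    """Sample the next factored state variable-by-variable."""
    nxt = []
    for i, ps in parents.items():
        idx = sum(state[p] << k for k, p in enumerate(ps))
        nxt.append(int(rng.random() < cpts[i][idx]))
    return tuple(nxt)

print(step((0, 1, 0, 1)))
```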
Efficient Structure Learning in Factored-State MDPs
We consider the problem of reinforcement learning in factored-state MDPs in the setting in which learning is conducted in one long trial with no resets allowed. We show how to extend existing efficient algorithms that learn the conditional probability tables of dynamic Bayesian networks (DBNs) given their structure to the case in which DBN structure is not known in advance. Our method learns th...
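The given-structure subroutine that this work extends can be sketched as conditional counting: with a known parent set, the CPT of each next-state variable is estimated from observed transitions (here with Laplace smoothing). The data format and helper below are illustrative, not the paper's interface.

```python
from collections import defaultdict

def estimate_cpt(transitions, parents, child):
    """ML estimate of P(child' = 1 | parents) from (state, next_state) pairs
    of binary tuples, given a known parent set."""
    counts = defaultdict(lambda: [0, 0])      # parent assignment -> [n0, n1]
    for s, s_next in transitions:
        key = tuple(s[p] for p in parents)
        counts[key][s_next[child]] += 1
    return {k: (n1 + 1) / (n0 + n1 + 2)       # Laplace-smoothed estimate
            for k, (n0, n1) in counts.items()}

data = [((0, 1), (1, 1)), ((0, 1), (0, 1)), ((1, 0), (1, 0))]
print(estimate_cpt(data, parents=(0,), child=0))
```

Structure learning then amounts to discovering which parent sets to pass to such a subroutine, rather than assuming them given.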
TeXDYNA: Hierarchical Reinforcement Learning in Factored MDPs
Reinforcement learning is one of the main adaptive mechanisms, both well documented in animal behaviour and central to computational studies of animats and robots. In this paper, we present TeXDYNA, an algorithm designed to solve large reinforcement learning problems with unknown structure by integrating hierarchical abstraction techniques of Hierarchical Reinforcement Learning and f...
Chi-square Tests Driven Method for Learning the Structure of Factored MDPs
SDYNA is a general framework designed to address large stochastic reinforcement learning (RL) problems. Unlike previous model-based methods in factored MDPs (FMDPs), it incrementally learns the structure of an RL problem using supervised learning techniques. SPITI is an instantiation of SDYNA that uses decision trees as factored representations. First, we show that, in structured RL problems, SP...
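The statistical core of such a method can be sketched as an independence test per candidate parent: if a chi-square test rejects independence between a parent variable's value and a child variable's next value, the parent is kept (e.g., as a decision-tree split). The sketch below assumes SciPy is available; the threshold and synthetic data are illustrative and not SPITI's actual procedure.

```python
import numpy as np
from scipy.stats import chi2_contingency

def is_relevant(parent_vals, child_next_vals, alpha=0.05):
    """Chi-square independence test between a candidate parent variable
    and a child's next value over observed transitions."""
    table = np.zeros((2, 2))
    for p, c in zip(parent_vals, child_next_vals):
        table[p, c] += 1
    _, p_value, _, _ = chi2_contingency(table)
    return p_value < alpha        # reject independence -> keep the parent

rng = np.random.default_rng(2)
parent = rng.integers(0, 2, size=200)
child = (parent ^ (rng.random(200) < 0.1)).astype(int)  # mostly copies parent
print(is_relevant(parent, child))                        # expected: True
```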
Optimistic Initialization and Greediness Lead to Polynomial Time Learning in Factored MDPs - Extended Version
In this paper we propose an algorithm for polynomial-time reinforcement learning in factored Markov decision processes (FMDPs). The factored optimistic initial model (FOIM) algorithm maintains an empirical model of the FMDP in a conventional way and always follows a greedy policy with respect to its model. The only trick of the algorithm is that the model is initialized optimistically. We pro...
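The idea of optimistic model initialization can be sketched in a flat MDP: initialize the empirical model so every state-action pair appears to reach a fictitious maximum-reward state, so that a purely greedy policy is still driven to explore unvisited pairs. FOIM applies this idea in the factored setting; everything below (sizes, the fictitious "Eden" state construction) is an illustrative reconstruction under those assumptions, not the paper's exact algorithm.

```python
import numpy as np

S, A, R_MAX = 4, 2, 1.0
EDEN = S                                  # fictitious absorbing max-reward state
counts = np.zeros((S + 1, A, S + 1))
counts[:, :, EDEN] = 1.0                  # prior mass: everything leads to Eden
R = np.zeros((S + 1, A))
R[EDEN, :] = R_MAX                        # Eden pays the maximum reward

def model():
    """Empirical transition model from counts (optimistic before data)."""
    return counts / counts.sum(axis=2, keepdims=True)

def greedy_policy(gamma=0.9, iters=200):
    """Value iteration on the current empirical model; greedy actions only."""
    P = model()
    V = np.zeros(S + 1)
    for _ in range(iters):
        Q = R + gamma * (P @ V)
        V = Q.max(axis=1)
    return Q.argmax(axis=1)[:S]

print(greedy_policy())   # before any data, every state-action looks rewarding
```

As real transition counts accumulate for a pair (s, a), the prior mass on Eden is washed out and the greedy policy converges toward acting on the true dynamics.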